Search CORE

42 research outputs found

Discovering Latent Information By Spreading Activation Algorithm For Document Retrieval

Author: Ngo Vuong M.
Publication date: 29/07/2018

Syntactic search relies on keywords contained in a query to find suitable documents, so documents that do not contain the keywords but contain information related to the query are not retrieved. Spreading activation is an algorithm for finding latent information in a query by exploiting relations between nodes in an associative network or semantic network. However, the classical spreading activation algorithm uses all relations of a node in the network, which adds unsuitable information to the query. In this paper, we propose a novel approach for semantic text search, called query-oriented-constrained spreading activation, that uses only relations relevant to the content of the query to find truly related information. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8% better than the syntactic search and the search using the classical constrained spreading activation, respectively. Keywords: Information Retrieval, Ontology, Semantic Search, Spreading Activation. Comment: 12 pages, will be published in The International Journal of Artificial Intelligence & Applications (IJAIA). arXiv admin note: text overlap with arXiv:1807.0796

arXiv.org e-Print Archive

A Similarity Measure for Weaving Patterns in Textiles

Author: Helmer Sven
Ngo Vuong M.
Publication date: 10/10/2018

We propose a novel approach for measuring the similarity between weaving patterns that can provide similarity-based search functionality for textile archives. We represent textile structures using hypergraphs and extract multisets of k-neighborhoods from these graphs. The resulting multisets are then compared using Jaccard coefficients, Hamming distances, and cosine measures. We evaluate the different variants of our similarity measure experimentally, showing that it can be implemented efficiently and illustrating its quality by using it to cluster and query a data set containing more than a thousand textile samples. Comment: 10 pages, will be published in SIGIR 201

arXiv.org e-Print Archive

Semantic Search by Latent Ontological Features

Author: Cao Tru H.
Ngo Vuong M.
Publication date: 15/07/2018

Both named entities and keywords are important in defining the content of a text in which they occur. In particular, people often use named entities in information search. However, named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show that the proposed models achieve better search quality than the purely keyword-based model, and demonstrate their advantages for both text retrieval and representation of documents and queries. Comment: 17 pages, accepted by New Generation Computing (2012)

arXiv.org e-Print Archive

Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search

Author: Cao Tru H.
Le Tuan M. V.
Ngo Vuong M.
Publication date: 20/07/2018

Purely keyword-based text search is not satisfactory because named entities and WordNet words are also important elements in defining the content of a document or a query in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. Words in WordNet also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. We propose an ontology-based generalized Vector Space Model for semantic text search. It exploits ontological features of named entities and WordNet words, and develops a query-oriented spreading activation algorithm to expand queries. In addition, it combines and utilizes the advantages of different ontologies for semantic annotation and searching. Experiments on a benchmark dataset show that, in terms of the MAP measure, our model is 42.5% better than the purely keyword-based model, and 32.3% and 15.9% better than the ones using only WordNet or named entities, respectively. Keywords: semantic search, spreading activation, ontology, named entity, WordNet. Comment: 6 pages, accepted by RIVF. arXiv admin note: substantial text overlap with arXiv:1807.05579; text overlap with arXiv:1807.0557

arXiv.org e-Print Archive

WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

Author: Cao Tru H.
Le Tuan M. V.
Ngo Vuong M.
Publication date: 15/07/2018

Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings, in general, improves search performance. In this paper, we survey WordNet-based information retrieval systems, which employ a word sense disambiguation method to process queries and documents. The problem is that in many cases a word has more than one possible direct sense, and picking only one of them may give a wrong sense for the word. Moreover, the previous systems use only word forms to represent word senses and their hypernyms. We propose a novel approach that uses the most specific common hypernym of the remaining undisambiguated multi-senses of a word, as well as combined WordNet features, to represent word meanings. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 17.7% better than the lexical search, and at least 9.4% better than all surveyed search systems using WordNet. Keywords: ontology, word sense disambiguation, semantic annotation, semantic search. Comment: 6 pages, will be in the proceedings of the 5th International Conference on Intelligent Computing and Information Systems (ICICIS-2011), in cooperation with ACM. 30 June to 3 July, 2011, Cairo, Egypt

arXiv.org e-Print Archive

Designing and Implementing Data Warehouse for Agricultural Big Data

Author: Kechadi M-Tahar
Le-Khac Nhien-An
Ngo Vuong M.
Publication date: 29/05/2019

In recent years, precision agriculture that uses modern information and communication technologies has become very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers and agribusinesses. Besides, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, agricultural data mining is considered a Big Data application in terms of volume, variety, velocity and veracity. It is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse by combining Hive, MongoDB and Cassandra. Our data warehouse has the following capabilities: (1) flexible schema; (2) data integration from multiple real agricultural datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability and partition tolerance; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse. Comment: Business intelligence, data warehouse, constellation schema, Big Data, precision agriculture

arXiv.org e-Print Archive

Exploring Combinations of Ontological Features and Keywords for Text Retrieval

Author: Cao Tru H.
Le Khanh C.
Ngo Vuong M.
Publication date: 20/07/2018

Named entities have been considered and combined with keywords to enhance information retrieval performance. However, there is not yet a formal and complete model that takes into account entity names, classes, and identifiers together. Our work explores various adaptations of the traditional Vector Space Model that combine different ontological features with keywords in different ways. It shows that the proposed models perform better than the keyword-based Lucene, and demonstrates their advantages for both text retrieval and representation of documents and queries. Comment: 10 pages, will be in PRICAI. arXiv admin note: substantial text overlap with arXiv:1807.0557

arXiv.org e-Print Archive

Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search

Author: Cao Tru H.
Ngo Vuong M.
Publication date: 15/07/2018

Named entities and WordNet words are important in defining the content of a text in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. WordNet words also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Besides, there are related concepts that do not appear in a query, but can bring out the meaning of the query if they are added. The traditional constrained spreading activation algorithms use all relations of a node in the network, which adds unsuitable information to the query. In contrast, we use only relations represented in the query. We propose an ontology-based generalized Vector Space Model for semantic text search. It discovers relevant latent concepts in a query by relation-constrained spreading activation. Besides, to represent a word having more than one possible direct sense, it combines the most specific common hypernym of the remaining undisambiguated multi-senses with the form of the word. Experiments on a benchmark dataset show that, in terms of the MAP measure for retrieval performance, our model is 41.9% and 29.3% better than the purely keyword-based model and the traditional constrained spreading activation model, respectively. Comment: 9 pages, accepted by the 5th International Joint Conference on Natural Language Processing (IJCNLP-2011). arXiv admin note: text overlap with arXiv:1807.0557

arXiv.org e-Print Archive

An Efficient Data Warehouse for Crop Yield Prediction

Author: Kechadi M-Tahar
Le-Khac Nhien-An
Ngo Vuong M.
Publication date: 26/06/2018

Nowadays, precision agriculture combined with modern information and communications technologies is becoming more common in agricultural activities such as automated irrigation systems, precision planting, variable rate applications of nutrients and pesticides, and agricultural decision support systems. In the latter, crop management data analysis, based on machine learning and data mining, focuses mainly on how to efficiently forecast and improve crop yield. In recent years, raw and semi-processed agricultural data have usually been collected using sensors, robots, satellites, weather stations, farm equipment, farmers and agribusinesses, while the Internet of Things (IoT) should deliver the promise of wirelessly connecting objects and devices in the agricultural ecosystem. Agricultural data typically captures information about farming entities and operations. Every farming entity encapsulates an individual farming concept, such as field, crop, seed, soil, temperature, humidity, pest, and weed. Agricultural datasets are spatial, temporal, complex, heterogeneous, non-standardized, and very large. In particular, agricultural data is considered Big Data in terms of volume, variety, velocity and veracity. Designing and developing a data warehouse for precision agriculture is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. Some of the requirements for such an agricultural data warehouse are privacy, security, and real-time access among its stakeholders (e.g., farmers, farm equipment manufacturers, agribusinesses, co-operative societies, customers and possibly Government agencies). However, there are currently very few reports in the literature that focus on the design of efficient data warehouses with a view to enabling Agricultural Big Data analysis and data mining. In this paper ... Comment: 12 pages. Keywords: data warehouse, constellation schema, crop yield prediction, precision agriculture

arXiv.org e-Print Archive

A Generalized Vector Space Model for Ontology-Based Information Retrieval

Author: Cao Tru H.
Ngo Vuong M.
Publication date: 20/07/2018

Named entities (NE) are objects that are referred to by names, such as people, organizations and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which are ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper. Comment: 5 pages, in Vietnamese. Keywords: information retrieval, vector space model, ontology, named entity, keyword. Accepted by Vietnamese Journal on Information Technologies and Communication

arXiv.org e-Print Archive